Abstract: Battery-powered transmitters face an energy
constraint; replenishing their energy from a renewable energy
source (such as solar or wind power) can lead to a longer
lifetime. We consider the problem of finding the optimal power
allocation under random channel conditions for a wireless
transmitter, such that the rate of information transfer is
maximized. A rechargeable battery, which is periodically
charged by a renewable source, powers the transmitter. The
problem is formulated as a Markov Decision Process. Structural
properties derived in this paper, such as the monotonicity of
the optimal value and policy, will be of vital importance in
understanding the kind of algorithms and approximations needed
in real-life scenarios. The effect of the curse of
dimensionality, which is prevalent in dynamic programming
problems, can thus be reduced. We show our results under the
most general of assumptions.

Index Terms: Optimal reward function, monotone optimal policy,
concavity, stochastic domination.

I. Introduction 

As we move towards hand-held devices that use wireless
transmitters, there is a growing need to prolong the lifetime
of their batteries without having to manually recharge them on
a regular basis. One natural solution is to utilize the
environment, i.e., to have a renewable energy source recharge
the battery periodically. This enables the system to be
self-sustaining. Renewable energy sources include solar power,
wind energy, geothermal energy and ocean energy (tidal and
wave). Our objective here is to maximize the throughput of a
wireless transmitter equipped with a renewable energy source.
(A lot of work has also been done on optimizing the performance
of battery-powered sensors (see Chang [1], Hou [2]) and in the
field of energy harvesting (see Yasser [3]). A recent paper has
shown experimentally that it is possible to power a remote
sensor via magnetic resonance without being in contact with the
sensor, see Kurs [4].)

Renewable sources of energy are better modelled as random
sources because we lack control over them (for example, with
wind energy the wind speed is not in our control). Thus the key
challenges we face arise from randomness in the recharge energy
from the renewable source and randomness in the channel state.
Also, since we have a battery, the maximum energy that can be
stored at any point of time is limited. This is quite different
from having a constraint only on the average power used. There
could be a case for not operating at energy levels close to the
maximum, lest added energy go to waste, whereas randomness in
the channel state could see the optimal policy conserving
energy while waiting for a better channel. We hope to address
such trade-offs in this paper.

We model the problem of maximizing the throughput of a wireless
transmitter powered by renewable energy as an infinite-horizon
discounted-reward Markov Decision Process (MDP). We use the
reward function ($J^*$), which represents the overall
throughput, to compare policies. An optimal policy decides what
power to allocate for every possible value of battery state and
channel state (together defined as the state) so as to obtain
the maximum overall reward ($J^*$) for every state. MDP or
dynamic programming solutions generally suffer from the "curse
of dimensionality", because the state space tends to be
exponential in one or more system parameters. That is the case
in our problem as well. Higher-complexity solutions are not
preferred, as they are difficult to implement. In such a case,
having some kind of structure on the solution is a big
advantage for implementation, and it also makes the problem
more analytically tractable. Our contribution here is to prove
that the optimal policy is non-decreasing w.r.t. the state. Our
proofs rely only on standard results and techniques used in
MDPs. Monotonicity of the optimal policy is also important
because it shows that the structure of the solution is
unaffected by the particular probability distributions of the
channel state and recharge energy. Once we have proven that the
optimal policy is non-decreasing, the search space
automatically reduces. Moreover, on this basis we can also try
to obtain threshold behaviour (approximately, if suitable),
which gives us a chance to make the implementation work in real
time.

As far as structural properties go, monotonicity of the optimal
policy is one of the most basic results, and hence there has
been a plethora of work on the matter. One of the earliest
methods for proving monotonicity was provided by Serfozo [5].
In his book [6], Martin Puterman has also provided sufficient
conditions for it; here, however, we approach the problem in a
different manner (we show results based on properties of $J^*$
rather than of the Transition Probability Matrix). There has
also been a lot of work on optimal policies for rechargeable
sensors, but with different considerations. In [7] we find a
policy which takes into account not only the rate of
information transfer but also the actual throughput for the
queued data. Similarly, in [8], the authors deal with the
finite-horizon equivalent and give an on-line policy which can
guarantee a fraction of the optimal throughput.

After defining the problem, we set up the equations for finding
the solution in Section II. In Section III we begin by proving
results about monotonicity (non-decreasing nature) and
concavity of $J^*$, and then move on to our main result, where
we prove that the optimal power allocation function is
non-decreasing. Once we have our main structural result, we
discuss possible generalizations of this framework. In Section
IV we present simulation results that verify our result and
show the effects of varying system parameters, and we conclude
by noting some of the work that is being taken up.

II. Formalism 

A. System definition 

We consider a system consisting of one receiver and one
transmitter with a wireless channel for communication. A fading
channel is assumed. For a fading wireless channel, the maximum
rate of information transfer, i.e. the capacity of the channel
(due to Shannon [9]), is

\[ C = \log(1 + \mathrm{SNR}), \qquad \mathrm{SNR} = \frac{Ph}{N_0 W} \]

Here $P$ is the transmitted power, $h$ is the channel-fade
coefficient, $N_0$ is the noise spectral density and $W$ is the
bandwidth (SNR is thus the signal-to-noise ratio). The
channel-fade coefficient $h \in \mathcal{H} = \{e_1, e_2,
\ldots, e_N\}$ is drawn according to the known probability
distribution $P_H(\cdot)$. We assume a memoryless channel, and
$\mathcal{H}$ represents the set of possible channel states,
where $e_i < e_j$ for $i < j$. On the transmitter side, power
is provided by a rechargeable battery which has finite capacity
to store energy (this could be the model for remote sensors
placed in obscure areas, which can be recharged periodically
using only renewable sources like wind and solar energy and
which have a limited capacity to store energy). Our main aim is
to find the optimal power allocation policy for this system,
i.e. the rule by which power is to be used for data
transmission, in terms of the other parameters of the system,
so as to get the maximum rate of information transfer. Time is
considered to be slotted and we assume full channel-side
information (CSI), so we have perfect channel state information
before transmission in every slot.

Let the energy in the battery at the beginning of the $n$th
time slot be $\xi_n$ and the power allocated in that slot be
$P_n$ (energy per slot). We use the random variable $X_n$ to
model the amount of recharging energy added to the battery at
the end of the $n$th slot by the renewable source. The process
$\{X_n\}_{n \ge 1}$ is assumed to be i.i.d., and each $X_n$ has
finite support in the set $\{0, 1, \ldots, a\}$. All our
variables take non-negative integer values. (For solar energy,
for example, refer to [10] for a model of the exact
distribution of $X$.) Using these we can write our system
equation

\[ \xi_{n+1} = \min\bigl( (\xi_n - P_n)^+ + X_n,\ \xi_m \bigr) \tag{1} \]

where $(x)^+ = \max\{x, 0\}$ and $\xi_m$ is the maximum energy
that can be stored in the battery.
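As a small illustration of the system equation (1), the battery recursion can be sketched in a few lines of Python. The function and argument names (`next_energy`, `simulate`, `capacity`, and so on) are hypothetical and chosen only for this example; `capacity` plays the role of $\xi_m$.

```python
def next_energy(energy, power, recharge, capacity):
    """One step of recursion (1): min((energy - power)^+ + recharge, capacity)."""
    remaining = max(energy - power, 0)        # the (x)^+ operation
    return min(remaining + recharge, capacity)


def simulate(energy0, powers, recharges, capacity):
    """Energy levels visited when the given powers and recharges are applied in order."""
    levels = [energy0]
    for power, recharge in zip(powers, recharges):
        levels.append(next_energy(levels[-1], power, recharge, capacity))
    return levels
```

Any recharge energy that would push the battery above `capacity` is simply lost, which is the effect the introduction refers to when it mentions added energy going to waste.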

B. Markov Decision Process formulation 

To solve this problem we formulate it as an infinite-horizon
Markov Decision Process (MDP). The state space, $S$, is
two-dimensional; a typical state is $(\xi, h)$, which
represents the current energy in the battery and the current
channel-fade coefficient. The size of the state space is
therefore $|S| = (\xi_m + 1) \times N$ (note that the energy in
the battery can be 0). The valid action space (power
allocation) for the state $(\xi, h)$ is $P \in \{0, 1, \ldots,
\xi\}$, because at any time we can allocate at most all the
energy available in the battery, and we can also choose to
allocate zero power (with this restriction the $(\cdot)^+$ sign
in the system equation becomes redundant). The union of all
action spaces is $A = \{0, 1, \ldots, \xi_m\}$. We consider
discounted rewards with a constant discount factor $\lambda \in
(0, 1)$.



Our reward function $r : S \times A \to \mathbb{R}_0^+$ is

\[ r((\xi, h), P) = \log\Bigl(1 + \frac{hP}{N_0 W}\Bigr) \]



Now we define the optimal reward function $J^* : S \to
\mathbb{R}_0^+$ as the optimal value for each state that we
start with. The Transition Probability Matrix (TPM),
$[\mathbb{P}\{(\xi_0, h_0) \mid (\xi, h), P\}]$, represents the
probability of getting to some state $(\xi_0, h_0)$ starting
from $(\xi, h)$ and taking action $P$. Using all of the above
we can write Bellman's equation of dynamic programming as

\[ J^*(\xi, h) = \max_{P \le \xi} \Bigl\{ \log\Bigl(1 + \frac{hP}{N_0 W}\Bigr) + \lambda \sum_{\xi_0 = 0}^{\xi_m} \sum_{h_0 = 1}^{N} \mathbb{P}\{(\xi_0, h_0) \mid (\xi, h), P\} \times J^*(\xi_0, h_0) \Bigr\} \]

We will write this succinctly (using $s = (\xi, h)$ as the
state) as

\[ J^*(s) = \max_{P \le \xi} \bigl\{ r(s, P) + \lambda\, \mathbb{E}_{X, h_0}\bigl( J^*(f(s, P), h_0) \bigr) \bigr\} \tag{2} \]

where $f$ represents the right-hand side of (1).

A policy for this system is a map from the state space to the
action space for each epoch, but as this is an infinite-horizon
MDP we only look at stationary deterministic policies to get
the maximum throughput. So the optimal policy for our problem
is of the form $\pi^* = \{\mu^*, \mu^*, \ldots\}$, and for
convenience we call it policy $\mu^*$. We can write the
equation for the optimal decision rule $\mu^* : S \to A$
succinctly as

\[ \mu^*(s) = \arg\max_{P \le \xi} \bigl\{ r(s, P) + \lambda\, \mathbb{E}_{X, h_0}\bigl( J^*(f(s, P), h_0) \bigr) \bigr\} \]

With this our formulation of the problem is complete, and we
can move on to the results.



III. Results 

Here we prove structural results about the monotonicity of
$J^*$ and $\mu^*$ for our optimal power allocation problem,
which we have formulated as an MDP.

In the previous section we wrote Bellman's equation for our
MDP; one way to solve it is the value iteration procedure
(refer to the book by Bertsekas [11]). For this we start with
an initial value (estimate) for the optimal reward function,
say $J_0(s) = 0\ \forall\, s \in S$, and then write the
iteration equations as

\[ J_{k+1}(s) = \max_{P \le \xi} \bigl\{ r(s, P) + \lambda\, \mathbb{E}_{X, h_0}\bigl( J_k(f(s, P), h_0) \bigr) \bigr\} \tag{3} \]

where $s = (\xi, h)$. From the theory of infinite-horizon
discounted-reward MDP problems we know that this converges (to
$J^*$) under the condition of bounded reward per stage. This
condition is satisfied in our case: the reward function is
bounded, and the action space and state space are finite owing
to the discrete nature of our formulation.
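The value iteration (3) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the implementation used for the simulations: the function name `value_iteration`, its arguments, the fixed number of sweeps, and the tie-breaking rule (the smallest maximizing power is kept) are all choices made for this example. Channel states and recharge amounts are passed as explicit probability lists.

```python
import math


def value_iteration(capacity, channels, p_channel, p_recharge, n0w, discount, sweeps=200):
    """Approximate J* and a greedy decision rule for the battery MDP via iteration (3).

    J[e][c] is the value estimate for energy e and channel index c;
    p_recharge[x] is the probability that the recharge energy equals x.
    """
    n_ch = len(channels)
    J = [[0.0] * n_ch for _ in range(capacity + 1)]           # J_0(s) = 0
    policy = [[0] * n_ch for _ in range(capacity + 1)]
    for _ in range(sweeps):
        # Average J_k over the next channel state, for every next energy level.
        ev = [sum(p * J[e][c] for c, p in enumerate(p_channel))
              for e in range(capacity + 1)]
        new_J = [[0.0] * n_ch for _ in range(capacity + 1)]
        for e in range(capacity + 1):
            # Expected continuation value for each feasible power (independent of h).
            cont = [sum(px * ev[min(e - power + x, capacity)]
                        for x, px in enumerate(p_recharge))
                    for power in range(e + 1)]
            for c, h in enumerate(channels):
                best_val, best_p = -math.inf, 0
                for power in range(e + 1):
                    val = math.log(1 + h * power / n0w) + discount * cont[power]
                    if val > best_val:                         # ties keep the smaller power
                        best_val, best_p = val, power
                new_J[e][c], policy[e][c] = best_val, best_p
        J = new_J
    return J, policy
```

For example, `value_iteration(6, [1, 2, 3], [0.3, 0.4, 0.3], [0.5, 0.3, 0.2], 10.0, 0.85)` returns a table `J` whose rows are indexed by energy and whose columns are indexed by channel state, together with the corresponding power table. The parameter values in this call are illustrative only.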



A. Preliminary Results 

Here we state and prove lemmas which will be required later to
prove the main theorem.

Lemma 1 (Monotone Optimal Reward Function). The optimal reward
function $J^*(\xi, h)$ is non-decreasing in both arguments.
There are two parts:

1) For any $\xi \in \{0, \ldots, \xi_m\}$,

\[ J^*(\xi, h^+) \ge J^*(\xi, h^-) \quad \text{where } h^+ > h^- \]

2) For any $h \in \mathcal{H}$,

\[ J^*(\xi^+, h) \ge J^*(\xi^-, h) \quad \text{where } \xi^+ > \xi^- \]

Proof: (Part 1) Take any $\xi$ and consider channel states
$h^-$ and $h^+$ where $h^+ > h^-$. As the channel process is
i.i.d., future channel states are independent of the current
channel state, so the second term in (2) is identical for
$J^*(\xi, h^+)$ and $J^*(\xi, h^-)$ (as a function of $P$).
Take $P^- = \mu^*(\xi, h^-)$; using (2) at this power we have

\[ J^*(\xi, h^+) - J^*(\xi, h^-) \ge \log\Bigl(1 + \frac{h^+ P^-}{N_0 W}\Bigr) - \log\Bigl(1 + \frac{h^- P^-}{N_0 W}\Bigr) \ge 0 \tag{4} \]

The first inequality holds because $P^-$ is feasible, but not
necessarily optimal, for the state $(\xi, h^+)$; the second
holds because $h^+ > h^-$.



Proof: (Part 2) Take any $h$ and consider $\xi^+$ and $\xi^-$
where $\xi^+ > \xi^-$. Starting the value iteration with
$J_0(s) = 0\ \forall\, s \in S$, we use induction to prove our
result (for every step of value iteration). The base case holds
trivially. Now assume that $J_k(\xi, h)$ is non-decreasing in
$\xi$. Let $P^-$ maximize the r.h.s. of (3) for the state
$(\xi^-, h)$. From our iteration equations we have, at power
$P = P^-$ and for $D = J_{k+1}(\xi^+, h) - J_{k+1}(\xi^-, h)$,

\[ D \ge \lambda\, \mathbb{E}_{X, h_0}\bigl[ J_k(f(\xi^+, P^-), h_0) - J_k(f(\xi^-, P^-), h_0) \bigr] \tag{5} \]

Since $\xi^+ > \xi^-$, for the same power $P^-$ we have
$f(\xi^+, P^-) \ge f(\xi^-, P^-)$ (for every instance of $X$).
By the induction hypothesis $J_k(\xi, h)$ is non-decreasing in
$\xi$, hence the term inside the expectation in (5) is
non-negative (for every instance of $X$ and $h_0$). Hence after
taking the expectation we have

\[ J_{k+1}(\xi^+, h) \ge J_{k+1}(\xi^-, h) \]

By induction the above holds $\forall\, k \in \mathbb{Z}^+$,
and the result follows by taking $\lim_{k \to \infty}$. ■

The above lemma can be written compactly as

\[ J^*(\xi^+, h^+) \ge J^*(\xi^-, h^-) \quad \forall\ \xi^+ \ge \xi^-,\ h^+ \ge h^- \]

Now that we have shown the monotone (non-decreasing) nature of
the optimal reward function, another property that will go a
long way towards proving our final result is the concavity of
$J^*$. Concavity (convexity), or equivalently sub-modularity
(super-modularity), has typically been the most used route to
proving monotonicity of a policy. So here, with a little extra
set-up, we prove the important property of concavity of $J^*$
in energy only.

Lemma 2 (Concave Optimal Reward Function). The optimal reward
function $J^*(\xi, h)$ is concave in $\xi$, for a fixed $h$.

Proof: We use induction on the value iteration steps, just as
before. We first show that concavity of $J_k$ implies concavity
of $J_{k+1}$. Assuming $J_k$ is concave, we take states

\[ s_1 = (\xi_1, h), \quad s_2 = (\xi_2, h), \quad s = (\xi, h) \]

where $\xi = \alpha \xi_1 + (1 - \alpha) \xi_2$ $(0 \le \alpha
\le 1)$. Taking the optimal powers for this step of the
iteration at $s_1$ and $s_2$ as $P_1$ and $P_2$, we can write

\[ J_{k+1}(s_1) = r(s_1, P_1) + \lambda\, \mathbb{E}_{X, h_0}\bigl[ J_k(f(s_1, P_1), h_0) \bigr] \]

\[ J_{k+1}(s_2) = r(s_2, P_2) + \lambda\, \mathbb{E}_{X, h_0}\bigl[ J_k(f(s_2, P_2), h_0) \bigr] \]

We know that the $\log(\cdot)$ reward is a concave function of
$P$ and is constant w.r.t. variation in $\xi$, hence we have

\[ \alpha\, r(s_1, P_1) + (1 - \alpha)\, r(s_2, P_2) \le r(s, P) \tag{6} \]

where $P = \alpha P_1 + (1 - \alpha) P_2$, and $s$ can be used
because it has the same channel coefficient, $h$, as $s_1$ and
$s_2$. By the induction hypothesis $J_k$ is concave as well, so

\[ \alpha J_k(f(s_1, P_1), h_0) + (1 - \alpha) J_k(f(s_2, P_2), h_0) \le J_k\bigl( \alpha f(s_1, P_1) + (1 - \alpha) f(s_2, P_2),\ h_0 \bigr) \tag{7} \]

Beyond this point we divide the problem into cases, depending
on the value of $X$.

Case 1: All $X$ such that $f(s_1, P_1), f(s_2, P_2) < \xi_m$.

\[ \Rightarrow\ \alpha f(s_1, P_1) + (1 - \alpha) f(s_2, P_2) = \alpha \xi_1 + (1 - \alpha) \xi_2 - (\alpha P_1 + (1 - \alpha) P_2) + X = \xi - P + X = f(s, P) \]

The last equality follows since the argument in this case is
clearly $< \xi_m$. Hence, continuing from (7), we can write

\[ \alpha J_k(f(s_1, P_1), h_0) + (1 - \alpha) J_k(f(s_2, P_2), h_0) \le J_k(f(s, P), h_0) \tag{8} \]

Using (6) and (8) we can thus write

\[ \alpha J_{k+1}(s_1) + (1 - \alpha) J_{k+1}(s_2) \le r(s, P) + \lambda J_k(f(s, P), h_0) \tag{9} \]

Case 2: All $X$ such that $f(s_1, P_1) = \xi_m = f(s_2, P_2)$.

\[ \Rightarrow\ \alpha (\xi_1 - P_1 + X) + (1 - \alpha)(\xi_2 - P_2 + X) \ge \xi_m \]

so $f(s, P) = \xi_m$ and hence we can write

\[ J_k\bigl( \alpha f(s_1, P_1) + (1 - \alpha) f(s_2, P_2),\ h_0 \bigr) = J_k(\xi_m, h_0) = J_k(f(s, P), h_0) \]

from which the same result as in (9) follows.

Case 3: All $X$ such that $f(s_2, P_2) < f(s_1, P_1) = \xi_m$.

\[ \xi_2 - P_2 + X < \xi_m = \xi_1 - P_1 + X - \beta \quad (\beta \ge 0), \]

\[ \alpha f(s_1, P_1) + (1 - \alpha) f(s_2, P_2) = \xi - P + X - \alpha\beta \tag{10} \]

Clearly the term on the r.h.s. of (10) is less than $\xi_m$,
and it is also $\le (\xi - P + X)$, so we can conclude

\[ \xi - P + X - \alpha\beta \le \min\{\xi - P + X,\ \xi_m\} \]

Since $J_k$ is non-decreasing in energy (shown in the proof of
Lemma 1) we can conclude the same as in (8), and from there (9)
as well. This completes the cases.

From these three cases we have seen that (9) is satisfied for
all $h_0$ and all possible values of $X$, and hence we can
introduce the $\mathbb{E}(\cdot)$ operator and conclude

\[ \alpha J_{k+1}(s_1) + (1 - \alpha) J_{k+1}(s_2) \le r(s, P) + \lambda\, \mathbb{E}_{X, h_0}\bigl[ J_k(f(s, P), h_0) \bigr] \le J_{k+1}(s) \]

where the last inequality holds because $P$ can generate a
value only less than or equal to the optimal value for state
$s$ (at the $(k+1)$th iteration).

We have thus shown that concavity of $J_k$ implies concavity of
$J_{k+1}$. Starting with a concave initial value of the
iteration, such as $J_0(s) = 0\ \forall\, s \in S$, we can
conclude by induction that $J_k$ is concave in $\xi$
$\forall\, k \in \mathbb{Z}^+$. Hence, as value iteration
converges, we can conclude that $J^*$ is concave in $\xi$. ■

Corollary 1. If we have energy levels $x \le w \le z \le y$
such that

\[ x + y = w + z \tag{11} \]

then

\[ J^*(x, h) + J^*(y, h) \le J^*(w, h) + J^*(z, h) \]

Proof: For a fixed $h$ define $g(\xi) = J^*(\xi, h)$. Also let
$\Delta g(i) = g(i + 1) - g(i)$; then we can write

\[ g(x) + g(y) = 2 g(x) + \sum_{i=x}^{y-1} \Delta g(i) \]

\[ g(w) + g(z) = 2 g(x) + \sum_{i=x}^{w-1} \Delta g(i) + \sum_{i=x}^{z-1} \Delta g(i) \]

As $J^*$ is concave in energy, we know that $\Delta g(i)$ is
non-increasing in $i$ (following the "law of diminishing
returns" for concave functions). The summations in the two
equations above have the same number of terms (due to (11)),
and the first equation sums $\Delta g(i)$ over higher values of
$i$ and is therefore smaller. ■
This property is called sub-modularity.
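The inequality in Corollary 1 depends only on the successive differences of $g$ being non-increasing, so it can be checked by brute force on any finite sequence. The sketch below is illustrative: the helper names `is_concave` and `satisfies_corollary` are hypothetical, and the sequence `g` stands for the values $g(0), g(1), \ldots$ at a fixed channel state.

```python
import math


def is_concave(g, tol=1e-12):
    """True if the successive differences g[i+1] - g[i] are non-increasing."""
    diffs = [b - a for a, b in zip(g, g[1:])]
    return all(d2 <= d1 + tol for d1, d2 in zip(diffs, diffs[1:]))


def satisfies_corollary(g, tol=1e-12):
    """Check g[x] + g[y] <= g[w] + g[z] for all x <= w <= z <= y with x + y == w + z."""
    n = len(g)
    for x in range(n):
        for y in range(x, n):
            for w in range(x, y + 1):
                z = x + y - w                  # enforces x + y == w + z
                if w <= z and g[x] + g[y] > g[w] + g[z] + tol:
                    return False
    return True
```

A concave sequence such as `[math.log(1 + i) for i in range(12)]` passes both checks, whereas a convex one such as `[i * i for i in range(6)]` fails both.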
B. Main Structural Result

Now we prove the main structural result with the aid of the
lemmas of the previous subsection.

Theorem 1 (Monotonic Optimal Policy). The optimal policy of
power allocation, $\mu^*(\xi, h)$, is non-decreasing in both
arguments. There are two parts:

1) For any $\xi \in \{0, \ldots, \xi_m\}$,

\[ \mu^*(\xi, h^+) \ge \mu^*(\xi, h^-) \quad \text{where } h^+ > h^- \]

2) For any $h \in \mathcal{H}$,

\[ \mu^*(\xi^+, h) \ge \mu^*(\xi^-, h) \quad \text{where } \xi^+ > \xi^- \]

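Proof: (Part 1) Consider two channel states $h^-$ and $h^+$
where $h^+ > h^-$. We can write

\[ \mu^*(\xi, h^+) = \arg\max_{P \le \xi} \Bigl\{ \log\Bigl(1 + \frac{h^+ P}{N_0 W}\Bigr) - \log\Bigl(1 + \frac{h^- P}{N_0 W}\Bigr) + \log\Bigl(1 + \frac{h^- P}{N_0 W}\Bigr) + \lambda\, \mathbb{E}_{X, h_0}\bigl( J^*(f(s, P), h_0) \bigr) \Bigr\} \]

Since the last term is independent of $h^+$ we have
$\mu^*(\xi, h^+) = \arg\max_{P \le \xi} \{T_1 + T_2\}$ where

\[ T_1 = \log\Bigl(1 + \frac{h^+ P}{N_0 W}\Bigr) - \log\Bigl(1 + \frac{h^- P}{N_0 W}\Bigr) \]

and $T_2$ is the full term that appears inside the max operator
in the expression for $\mu^*(\xi, h^-)$, which means that $T_2$
achieves its maximum at $P_{h^-} = \mu^*(\xi, h^-)$. Notice
that $T_1$ is monotonically increasing in $P$, since

\[ \frac{dT_1}{dP} = \frac{N_0 W (h^+ - h^-)}{(N_0 W + h^+ P)(N_0 W + h^- P)} > 0 \tag{12} \]

Considered at any $P < P_{h^-}$, the term $T_1$ has a value
less than at $P_{h^-}$ (because it is monotonically increasing)
and the same holds for $T_2$ (because its maximum is at
$P_{h^-}$). Hence $\{T_1 + T_2\}$ cannot achieve its maximum
for any $P < P_{h^-}$ and we conclude

\[ \mu^*(\xi, h^+) \ge \mu^*(\xi, h^-) \]

Proof: (Part 2) Firstly note that

\[ \xi_0 < \xi_m \ \Rightarrow\ \mathbb{P}\{\xi_0 \mid \xi, P\} = \mathbb{P}\{X = \xi_0 - \xi + P\} \]

\[ \xi_0 = \xi_m \ \Rightarrow\ \mathbb{P}\{\xi_m \mid \xi, P\} = \mathbb{P}\{X \ge \xi_m - \xi + P\} \]

From the above we can write the second term in $J^*$ as

\[ \sum_{\xi_0 = \xi - P}^{\xi_m} \sum_{h_0 = 1}^{N} \mathbb{P}\{h_0\} \times \mathbb{P}\{\xi_0 \mid \xi, P\} \times J^*(\xi_0, h_0) = \mathbb{E}_{h_0}\Bigl[ \sum_{i=0}^{\xi_m - \xi + P - 1} \mathbb{P}\{X = i\} \times J^*(\xi - P + i, h_0) + \mathbb{P}\{X \ge \xi_m - \xi + P\} \times J^*(\xi_m, h_0) \Bigr] \]

but we can write $\mathbb{P}\{X \ge \xi_m - \xi + P\}$ in terms
of the summation preceding it, hence we will have

\[ J^*(\xi, h) = \max_{P \le \xi} \Bigl\{ \log\Bigl(1 + \frac{hP}{N_0 W}\Bigr) + \lambda\, \mathbb{E}_{h_0}\Bigl[ J^*(\xi_m, h_0) - \sum_{i=0}^{\xi_m - \xi + P - 1} \mathbb{P}\{X = i\} \times \bigl[ J^*(\xi_m, h_0) - J^*(\xi - P + i, h_0) \bigr] \Bigr] \Bigr\} \tag{13} \]

Now we use contradiction to prove our result, i.e. assume that
there exist states $\xi_1 > \xi_2$ with optimal powers
$P_1 < P_2$.

Let $J_P(\xi, h)$ represent the r.h.s. term in (2), evaluated
at power $P$. Then, due to the optimality of $P_2$ with $\xi_2$
and $P_1$ with $\xi_1$, we have

\[ J_{P_2}(\xi_2, h) - J_{P_1}(\xi_2, h) \ge 0, \]

\[ J_{P_1}(\xi_1, h) - J_{P_2}(\xi_1, h) \ge 0 \]

Adding the two equations with the help of (13), and using
$g(\xi) = J^*(\xi, h_0)$ as well as $p_i = \mathbb{P}\{X = i\}$,
gives us

\[ \mathbb{E}_{h_0}\Bigl[ \sum_{i=0}^{\kappa_{11}} p_i A(i) + \sum_{i=\kappa_{11}+1}^{\kappa_{12}} p_i B(i) + \sum_{i=\kappa_{12}+1}^{\kappa_{21}} p_i C(i) + \sum_{i=\kappa_{21}+1}^{\kappa_{22}} p_i D(i) \Bigr] \ge 0 \tag{14} \]

for $\kappa_{ij} = \xi_m - y_{ij} - 1$, $y_{ij} = (\xi_i - P_j)$,
$i, j \in \{1, 2\}$, and

\[ A(i) = g(y_{11} + i) + g(y_{22} + i) - g(y_{12} + i) - g(y_{21} + i), \]
\[ B(i) = g(\xi_m) + g(y_{22} + i) - g(y_{12} + i) - g(y_{21} + i), \]
\[ C(i) = g(y_{22} + i) - g(y_{21} + i), \]
\[ D(i) = -g(\xi_m) + g(y_{22} + i). \]

In breaking the above summations appropriately we have assumed
w.l.o.g. $\kappa_{12} \le \kappa_{21}$, which means
$\kappa_{11} \le \kappa_{12} \le \kappa_{21} \le \kappa_{22}$
and $y_{11} \ge y_{12} \ge y_{21} \ge y_{22}$.

We will argue that (14) is a contradiction. Our following
calculations hold for every $h_0$.

Simply by our construction $y_{22} \le y_{12}, y_{21} \le
y_{11}$ and

\[ y_{11} + y_{22} = (\xi_1 + \xi_2) - (P_1 + P_2) = y_{21} + y_{12} \]

so by Corollary 1, $A(i) \le 0\ \forall\, i$. We know that $g$
is non-decreasing (Lemma 1). As $y_{22} \le y_{21}$ we have
$C(i) \le 0\ \forall\, i$. Since the range of summation for
$D(i)$ is such that $y_{22} + i \le \xi_m$, we also have
$D(i) \le 0\ \forall\, i$.

Now looking at $B(i)$, define successive differences
$\Delta g(l) = g(l + 1) - g(l)$ (using the same method as in
Corollary 1). Due to the concavity of $J^*$ (Lemma 2) this is
non-increasing. We can express $g(\xi_m)$, $g(y_{12} + i)$ and
$g(y_{21} + i)$ as a summation of $\Delta g$ starting from
$g(y_{22} + i)$. We then see that $g(\xi_m) + g(y_{22} + i)$
has fewer terms in the summation compared to $g(y_{12} + i) +
g(y_{21} + i)$, and those $\Delta g(l)$ terms are also smaller
since they are summed over higher $l$. Since $\Delta g$ is
positive we can conclude that $B(i) \le 0\ \forall\, i$.

So from all this we have shown that all terms in (14) are
negative $\forall\, h_0$, and thus when their expectation is
taken it will be negative too. Thus we have shown a
contradiction. Hence proved. ■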
The above result can be concisely written as

\[ \mu^*(\xi^+, h^+) \ge \mu^*(\xi^-, h^-) \quad \forall\ \xi^+ \ge \xi^-,\ h^+ \ge h^- \]

[Fig. 1. $\mu^*(\xi, h)$ vs. $\xi$ for $h = 5, 15$. Horizontal axis: energy state.]
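The step in Part 2 of the proof that leads to (13) uses only the fact that the recharge probabilities sum to one. As a sketch, with the shorthand $K = \xi_m - \xi + P$, $p_i = \mathbb{P}\{X = i\}$ and $g(\cdot) = J^*(\cdot, h_0)$ (the symbol $K$ is introduced here only for brevity), the algebra can be spelled out as:

```latex
\begin{align*}
\sum_{i=0}^{K-1} p_i\, g(\xi - P + i) + \mathbb{P}\{X \ge K\}\, g(\xi_m)
  &= \sum_{i=0}^{K-1} p_i\, g(\xi - P + i)
     + \Bigl(1 - \sum_{i=0}^{K-1} p_i\Bigr) g(\xi_m) \\
  &= g(\xi_m) - \sum_{i=0}^{K-1} p_i \bigl[ g(\xi_m) - g(\xi - P + i) \bigr].
\end{align*}
```

Taking the expectation over $h_0$ and multiplying by $\lambda$ gives the second term inside the maximization in (13).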



C. Possible Generalizations 

In this problem we had compact support for $X$ and $\xi$. As
long as we have compact support for these two, the results
carry through to uncountable state/action spaces as well. That
is, instead of having discrete values of $\xi$ and $X$, we can
make them continuous (over the real numbers) and end up with
the same results.

The reward function used here was $\log$; the following
properties were used explicitly in proving our results:

1) the reward $r$ depends only on $h$ and $P$; it is
   independent of $\xi$ (used in Lemma 1, part 1),
2) $r((\xi, h), P)$ is concave in $P$ (used in Lemma 2),
3) $\dfrac{\partial r((\xi, h), P)}{\partial h} \ge 0$ (used in (4)),
4) $\dfrac{\partial^2 r((\xi, h), P)}{\partial P\, \partial h} \ge 0$ (used in (12)).

No other property of the log function was used. This means that
any reward function satisfying these properties will give us
the same results. (The reward function is assumed to be
positive for all state/action pairs.)
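For a candidate reward function, the listed conditions can be screened numerically with finite differences before attempting a proof. The following is a rough sketch under assumptions: the names `log_reward` and `has_required_structure` are hypothetical, powers are restricted to an integer grid, and independence from $\xi$ is built in by giving the reward no energy argument. Passing such a check is only evidence, not a proof, that the conditions hold.

```python
import math


def log_reward(h, power, n0w=10.0):
    """Reward used in the paper: log(1 + h P / (N0 W))."""
    return math.log(1 + h * power / n0w)


def has_required_structure(reward, channels, max_power, tol=1e-12):
    """Finite-difference screen of the reward conditions on an integer power grid."""
    for power in range(max_power + 1):
        for h_lo, h_hi in zip(channels, channels[1:]):
            # Condition 3: non-decreasing in h.
            if reward(h_hi, power) < reward(h_lo, power) - tol:
                return False
            # Condition 4: the gain from one more unit of power grows with h.
            if power < max_power:
                inc_hi = reward(h_hi, power + 1) - reward(h_hi, power)
                inc_lo = reward(h_lo, power + 1) - reward(h_lo, power)
                if inc_hi < inc_lo - tol:
                    return False
    for h in channels:
        for power in range(max_power - 1):
            # Condition 2: concave in P (successive increments do not grow).
            first = reward(h, power + 1) - reward(h, power)
            second = reward(h, power + 2) - reward(h, power + 1)
            if second > first + tol:
                return False
    return True
```

With `channels` sorted in increasing order, `has_required_structure(log_reward, [1, 2, 3, 4], 20)` holds, while a reward that is convex in the power, such as `lambda h, p: p * p`, is rejected.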

IV. Simulation Results 

We present here simulation results which essentially 
verify our results (the properties proved here were ver- 
ified for a large number of parameters before being 
proved). 

We take the parameters in the problem as 



50 



56 



$\lambda = 0.85$, $N = 17$ and $N_0 W = 10$. This means that
the channel states are in $\mathcal{H} = \{1, \ldots, 17\}$.
The distribution of $h$ is taken to be bell-shaped and the
distribution of $X$ is taken to be strictly decreasing. For
this system we first plot the optimal policy $\mu^*(\xi, h)$
(which we have proved to be non-decreasing in both $\xi$ and
$h$),




and then the optimal reward function $J^*(\xi, h)$, which
should not only be non-decreasing in both arguments but also
concave in $\xi$.

[Fig. 2. $J^*(\xi, h)$ vs. $\xi$ for $h = 5, 10$. Horizontal axis: energy state.]

Apart from verifying our proven results, another important
feature to discuss is the structure of the random power added
in every slot, i.e. the distribution of $X$. Higher power added
in every slot should give us higher optimal powers to work
with, since even if we spend power on a bad channel once, we
would not have to wait long before the battery is recharged
(since higher values of $X$ are more likely). In this regard we
also present the graph of $\mu^*$ for two different
distributions of $X$. $P_{X_1}$ represents a distribution which
decreases with $x$ (this is also the distribution we have been
using until now) and $P_{X_2}$ represents a distribution which
is exactly inverted, i.e. it increases with $x$. Clearly
$P_{X_2}$ has a higher mean than $P_{X_1}$.
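The relation between the two recharge distributions can be illustrated with a toy example. The exact distributions used in the simulations are not given, so the linearly decreasing weights below are purely an assumption made for this sketch, as are the names `decreasing_pmf`, `mean`, `p_x1` and `p_x2`.

```python
def decreasing_pmf(a):
    """Hypothetical strictly decreasing pmf on {0, ..., a} with weights a+1, a, ..., 1."""
    weights = [a + 1 - x for x in range(a + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def mean(pmf):
    """Mean of a pmf given as a list indexed by the value x."""
    return sum(x * p for x, p in enumerate(pmf))


p_x1 = decreasing_pmf(5)     # decreases with x
p_x2 = p_x1[::-1]            # "exactly inverted": increases with x
```

Reversing a strictly decreasing distribution moves probability mass towards larger recharge values, so `mean(p_x2)` exceeds `mean(p_x1)`.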

[Fig. 3. $\mu^*(\xi, h)$ vs. $\xi$ for $P_{X_1}$, $P_{X_2}$ and $h = 10$. Horizontal axis: energy state.]

As an instructive example we can also look at the solution
after varying $\lambda$. Variation in $\lambda$ is of central
importance because it tells us how much importance is given to
future rewards as opposed to the current reward, which
basically dictates the average number of recharge cycles that
the battery may have to go through (and consequently its
effective lifetime).

We notice in our case that as $\lambda$ increases, more
importance is given to future rewards and consequently the
optimal powers become lower, i.e. power is saved for the
future, when better channels may be available.

[Fig. 4. $\mu^*(\xi, h)$ vs. $\xi$ for $\lambda = 0.5, 0.85, 0.9$ and $h = 15$. Horizontal axis: energy state.]



V. Conclusion 

In this paper we have proved one of the most important features
of the power allocation problem under the constraint of limited
battery capacity. The results have been proved from scratch,
without the use of any known results except the standard ones
for a general MDP setting. The most pleasing aspect of this
result is that no assumptions were required on the
distributions of $X$ and $h$, other than that their respective
processes are i.i.d. Along with the main result, side results
such as the monotone and concave nature of $J^*$ are also
important tools in deciding on a minimum-complexity algorithm.

Once we have a monotonically non-decreasing optimal policy, not
only is the search space for any algorithm reduced, but the
memory required to store the related tables is reduced as well,
which is very desirable as the sensors are quite small. The
policy here is an off-line policy.
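One way the reduction in search space could look is sketched below. This is not an algorithm from the paper: the function name `monotone_policy_search` and the callable `score(e, power)`, which stands for the quantity maximized in (2) at a fixed channel state, are hypothetical. The restriction of the search range is valid only when the optimal power is non-decreasing in the energy level, which is what Theorem 1 provides.

```python
def monotone_policy_search(score, capacity):
    """For each energy level e, search powers only in [previous optimum, e].

    Returns the power chosen per energy level and the number of score evaluations.
    """
    policy, evaluations = [], 0
    lower = 0
    for e in range(capacity + 1):
        best_p, best_val = lower, score(e, lower)
        evaluations += 1
        for power in range(lower + 1, e + 1):
            evaluations += 1
            val = score(e, power)
            if val > best_val:
                best_p, best_val = power, val
        policy.append(best_p)
        lower = best_p          # the next energy level never needs a smaller power
    return policy, evaluations
```

An exhaustive search evaluates `score` once for every feasible pair (energy, power), i.e. `(capacity + 1) * (capacity + 2) // 2` times; the restricted search needs fewer evaluations whenever the optimal power grows with the energy level.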

Other results being looked into include finding an actual
algorithm that takes full advantage of the results proved here.
Further ongoing work concerns the case of an unknown channel
process, in which case Q-learning methods need to be looked
into and possibly an on-line policy can be determined. Another
possibility is that of the $\{X_n\}_{n \ge 1}$ process being
dependent on the state, which is a realistic scenario in
capacitor charging models given for solar cells.